HSS8005

Week 4 Worksheet Notes

Interactions: Estimating, graphing and interpreting interaction effects

Aims

By the end of the session you will learn how to:

  • fit, visualise and interpret results from regression models that include interaction terms

R packages

Code
library(tidyverse)
library(easystats)
library(gtsummary)
library(ggformula)
library(sjlabelled)
library(ggeffects)
library(marginaleffects)

Introduction

This session explores examples of interactions in regression models. We will look more closely at some of the multiple regression models we have fit in previous labs and ask whether the effects of core explanatory variables could be said to depend on (or are conditioned on) the values of other explanatory variables included in the model. In practice, this will involve including the product of two explanatory variables in the model. Thus, taking a regression model of the generic form \(y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \epsilon\), we ask whether there is an interaction between variables \(X_1\) and \(X_2\) (i.e. whether the effect of one depends on the values of the other) by fitting a model of the form \(y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \color{green}{\beta_3(X_1 \times X_2)} + \epsilon\), where we are interested in the “interaction effect” represented by \(\beta_3\).
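As a minimal illustration of this generic setup, the following sketch (simulated data, base R only; all variable names are made up for this example) builds a dataset in which the true effect of \(X_1\) depends on \(X_2\), and recovers the interaction coefficient with lm():

```r
# Simulate data in which the effect of x1 depends on x2,
# i.e. the true data-generating model contains an interaction term
set.seed(8005)
n  <- 1000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
y  <- 1 + 0.2 * x1 + 0.3 * x2 + 0.5 * (x1 * x2) + rnorm(n)

# Fitting y ~ x1 * x2 estimates beta_0, beta_1, beta_2 and beta_3;
# the "x1:x2" coefficient is the interaction effect (true value: 0.5)
m <- lm(y ~ x1 * x2)
coef(m)
```

The estimated "x1:x2" coefficient should land close to the true value of 0.5, within sampling error.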

While fitting such interaction models is simple in practice, understanding when they are needed, interpreting, and communicating their results can be challenging, and a growing literature in the social sciences has explored best practices in relation to such models (Berry, Golder, and Milton 2012; Brambor, Clark, and Golder 2006; Clark and Golder 2023; Hainmueller, Mummolo, and Xu 2019). Advancements in software have also made it easier to interpret and present results from interaction models, with some notable R packages entering this space over the past few years; we have already encountered the {marginaleffects} and {ggeffects} packages when visualising results from logistic regression models, and examining results from interaction models presents very similar challenges. In this week’s lab we will use some of these tools to undertake comprehensive analyses of interaction effects.

We have already encountered interactions in all of the application readings we have engaged with so far. However, they mostly involved more complex cases of interactions, understanding which first requires a more basic knowledge of how interaction effects work and how they can be implemented in practice. Wu (2021) and Dingemans and Van Ingen (2015) make use of “cross-level” interactions, which we will revisit when learning about multilevel models (Week 6). Mitchell (2021) uses higher-level interactions on data aggregated at country level as an additional analysis to their main models, while Österman (2021) operates with a quasi-experimental design in which the interaction effect is taken to reveal a more directly causal effect in the data. This latter example is “more complex” only at a conceptual and study-design level, and we will replicate it in the lab exercises. However, we begin by looking at a conceptually less involved example where interaction effects are at the core of the analysis, also from the field of social trust research.

Are preferences for redistribution associated with education in European countries, and is the association moderated by social trust?

The research question in the title of this exercise relates directly to the research reported by Akaeda (2023). Following a detailed review of the literature on the relationship between attitudes towards redistribution, education, and social- and institutional trust, the author derives a hypothesis they want to investigate: “trust decreases the gap in preferences for redistribution due to education” (p. 296). They break down the hypothesis into two parts, one relating to social trust and the other to institutional trust.

Akaeda (2023) combines data from several waves of the World Values Survey (WVS) and the European Values Study (EVS) to test this research hypothesis. The combined dataset contains data on “74 countries, 26 years, 259 country-years, and 254,214 individuals”. Their interest is in modelling the hypothesised relationship on a global level, accounting for various individual explanatory variables as well as for macro-social factors at the level of countries and country-years (i.e. change over time within countries).

Here we will attempt to fit a smaller version of their “Model 3” (reported in Table 2). In terms of data, we focus specifically on “European” countries; for this purpose, we will use data from the latest wave of the European Values Study survey. In terms of variable selection and transformations, we will roughly follow Akaeda (2023), with some adjustments for convenience. Akaeda (2023:297–98) describes the variable selection procedure in these words:

  • the dependent variable is the score of preferences for redistribution. This score is based on a question … that asks respondents to indicate on a scale from 1 to 10 whether ‘Incomes should be made more equal (1)’ or ‘We need larger income differences as incentives for individual effort (10)’. In accordance with previous research, the scores were reversed such that a higher score indicates stronger support for redistribution;
  • the level of education is a key independent variable because the association between education and support for redistribution is a main focus of this study. Because previous studies have shed light on the mechanisms of university education related to a conservative view of redistribution, this analysis adopts the dummy for university or higher degree as an independent variable;
  • as a key moderator variable … social trust as measured by the question, ‘Generally speaking, would you say that most people can be trusted or that you need to be very careful in dealing with people?’, which had the following two potential responses: ‘1. Most people can be trusted’, ‘2. Need to be very careful’. Based on previous research involving social trust, this analysis employs the dummy for social trust (1 = ‘Most people can be trusted’);
  • the following individual-level controls: gender (1 = female, 0 = male), age, age squared, employment status (employed, self-employed, unemployed, retired, other), household income (z-scored for country-year units), marital status (married = 1), child (having child = 1), and political orientation (1 = right to 10 = left).

Data management

The original data is freely accessible from the European Values Study website in various formats. For the purposes of this lab, we can also access a lightly edited dataset in native .Rds format from the Data page (evs2017.rds). We can either download the dataset to a local data folder and load it from there, or load it directly from the web:

Code
evs <- datawizard::data_read("https://cgmoreh.github.io/HSS8005-24/data/evs2017.rds")
Reading data...
Variables where all values have associated labels are now converted into
  factors. If this is not intended, use `convert_factors = FALSE`.
210 out of 266 variables were fully labelled and converted into factors.
Following 8 variables are empty:
  v72_DE, v73_DE, v74_DE, v75_DE, v76_DE, v77_DE, v78_DE and v79_DE
  
Use `remove_empty_columns()` to remove them from the data frame.

Once the data is loaded in the workspace environment, we can select the variables that we are interested in, inspect them in a codebook and apply any necessary transformations:

Code
# Select the variables needed:

evs <- evs |> 
  select(v106,      # redistribution pref.
         v243_r,    # education
         v31,       # social trust
         v225,      # sex/gender
         age,
         v244,      # employment status
         v261_ppp,  # household income (corrected)
         v234,      # marital status
         v239_r,    # number of children
         v102,      # political orientation (1=left...10=right)
         )


# `View` the codebook for the selected variables:

data_codebook(evs) |> View()


# Recode variables to match Akaeda (2023) as closely as possible:

akaeda <- evs |> 
  data_modify(redistribute = reverse(v106),
              # recode education to a binary; 
              # treat the original variable as a numeric variable in the process to allow referring to numeric codes rather than the long text labels
              educ_univ = recode_values(as_numeric(v243_r),
                                   recode = list("1" = 3,
                                                 "0" = c(1, 2, 4))),
              # inverse to make "most people can be trusted" the second (indicator) category
              trusting = recode_values(as_numeric(v31),
                                       recode = list("1" = 1,
                                                     "0" = 2)),
              # just rename as "female" is already the second category
              female = v225,
              # recode "employment" as a factor/categorical (without treating it as numeric and adding labels later)
              employment = recode_values(v244,
                                         recode = list("Employed" = c("30h a week or more", 
                                                                      "less then 30h a week"), 
                                                       "Self-employed" = "self employed", 
                                                       "Unemployed" = "unemployed", 
                                                       "Retired" = "retired/pensioned", 
                                                       "Other" = c("military service", 
                                                                   "homemaker not otherwise employed", 
                                                                   "student", 
                                                                   "disabled", 
                                                                   "other"))),
              hh_income = v261_ppp,
              # recoding marital status as numeric
              married = recode_values(as_numeric(v234),
                                      recode = list("1" = 1,
                                                    "0" = 2:6)),
              
              child = recode_values(v239_r,
                                    recode = list("0" = 1,
                                                  "1" = 2:6)),
              politics = reverse(v102)
              ) |> 
  set_labels(married, labels = c("Not married", "Married")) |> 
  set_labels(trusting, labels = c("Not trusting", "Trusting")) |> 
  set_labels(educ_univ, labels = c("No university education", "University education")) |> 
  set_labels(child, labels = c("No children", "Has children")) |> 
  # Treat them as factors again for better reporting of labels
  to_label(married, trusting, educ_univ, child)

# `View` the codebook for the recoded dataset:

data_codebook(akaeda) |> View()

Relationships between the main variables of interest

Following Akaeda (2023), the main variables of interest are redistribute, educ_univ and trusting. We can inspect some basic relationships between them using boxplots and cross-tabulations:

Code
# Boxplots:

gf_boxplot(redistribute ~ educ_univ, data = akaeda)
Warning: Removed 1335 rows containing non-finite values (`stat_boxplot()`).

Code
gf_boxplot(redistribute ~ trusting, data = akaeda)
Warning: Removed 1335 rows containing non-finite values (`stat_boxplot()`).

Code
# Cross-tabulations:

# To check the values of a categorical variable by another categorical variable, a contingency table (cross-tabulation) is preferable.
# Base R is not great at cross-tabulations; there are better user-written functions,
# but relying only on packages we have already used so far, some options are:

## In base R (table):
### Simple frequencies
table(akaeda$educ_univ, akaeda$trusting)
                         
                          Not trusting Trusting
  No university education        26838    11319
  University education           10153     9416
Code
### Frequencies with added row marginal totals
table(akaeda$educ_univ, akaeda$trusting) |> addmargins(margin = 2)
                         
                          Not trusting Trusting   Sum
  No university education        26838    11319 38157
  University education           10153     9416 19569
Code
### Proportions, by row
table(akaeda$educ_univ, akaeda$trusting) |> prop.table(margin = 1)
                         
                          Not trusting  Trusting
  No university education    0.7033572 0.2966428
  University education       0.5188308 0.4811692
Code
### Proportions, by column with added column marginal totals
table(akaeda$educ_univ, akaeda$trusting) |> prop.table(margin = 2) |> addmargins(margin = 1)
                         
                          Not trusting  Trusting
  No university education    0.7255278 0.5458886
  University education       0.2744722 0.4541114
  Sum                        1.0000000 1.0000000
Code
## In base R (xtabs - formula style):
xtabs(~ educ_univ + trusting, data = akaeda)
                         trusting
educ_univ                 Not trusting Trusting
  No university education        26838    11319
  University education           10153     9416
Code
#### Proportions by column, rounded to 2 decimals, added column marginal totals
xtabs(~ educ_univ + trusting, data = akaeda)|> prop.table(margin = 2) |> round(2) |> addmargins(margin = 1)
                         trusting
educ_univ                 Not trusting Trusting
  No university education         0.73     0.55
  University education            0.27     0.45
  Sum                             1.00     1.00
Code
## In tidyverse (dplyr + tidyr)
### Frequencies
akaeda |> 
  group_by(educ_univ, trusting) |> 
  summarise(n=n()) |> 
  spread(trusting, n)
`summarise()` has grouped output by 'educ_univ'. You can override using the
`.groups` argument.
# A tibble: 3 × 4
# Groups:   educ_univ [3]
  educ_univ               `Not trusting` Trusting `<NA>`
  <fct>                            <int>    <int>  <int>
1 No university education          26838    11319    890
2 University education             10153     9416    434
3 <NA>                               239      118     31
Code
### Proportions
akaeda |> 
  group_by(educ_univ, trusting) |> 
  summarise(n=n()) |> 
  mutate(prop=n/sum(n)) |> 
  subset(select=c("educ_univ","trusting","prop")) |> 
  spread(trusting, prop)
`summarise()` has grouped output by 'educ_univ'. You can override using the
`.groups` argument.
# A tibble: 3 × 4
# Groups:   educ_univ [3]
  educ_univ               `Not trusting` Trusting `<NA>`
  <fct>                            <dbl>    <dbl>  <dbl>
1 No university education          0.687    0.290 0.0228
2 University education             0.508    0.471 0.0217
3 <NA>                             0.616    0.304 0.0799
Code
  # or, for % of education by trust:
  # spread(educ_univ, prop)


## In {datawizard}:
### Both frequencies and % in a listing format rather than crosstabulated
akaeda |> 
  data_group(educ_univ) |>
  data_tabulate(trusting, collapse = TRUE)
# Frequency Table

Variable |                               Group |        Value |     N | Raw % | Valid % | Cumulative %
---------+-------------------------------------+--------------+-------+-------+---------+-------------
trusting | educ_univ (No university education) | Not trusting | 26838 | 68.73 |   70.34 |        70.34
         |                                     |     Trusting | 11319 | 28.99 |   29.66 |       100.00
         |                                     |         <NA> |   890 |  2.28 |    <NA> |         <NA>
---------+-------------------------------------+--------------+-------+-------+---------+-------------
trusting |    educ_univ (University education) | Not trusting | 10153 | 50.76 |   51.88 |        51.88
         |                                     |     Trusting |  9416 | 47.07 |   48.12 |       100.00
         |                                     |         <NA> |   434 |  2.17 |    <NA> |         <NA>
---------+-------------------------------------+--------------+-------+-------+---------+-------------
trusting |                      educ_univ (NA) | Not trusting |   239 | 61.60 |   66.95 |        66.95
         |                                     |     Trusting |   118 | 30.41 |   33.05 |       100.00
         |                                     |         <NA> |    31 |  7.99 |    <NA> |         <NA>
------------------------------------------------------------------------------------------------------
Code
## In {gtsummary}:
akaeda |> 
  gtsummary::tbl_cross(educ_univ, trusting,
                     percent = "row",
                     missing = "no")
Observations with missing data have been removed.

                                              people can be trusted/can’t be too careful (Q7)
                                              Not trusting    Trusting        Total
educational level respondent: recoded (Q81)
    No university education                   26,838 (70%)    11,319 (30%)    38,157 (100%)
    University education                      10,153 (52%)     9,416 (48%)    19,569 (100%)
Total                                         36,991 (64%)    20,735 (36%)    57,726 (100%)
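As a quick sanity check on these numbers, the counts printed by table() above can be re-entered by hand and the row proportions recomputed in base R (the counts below are copied from the output above; this is a hand-check, not part of the analysis pipeline):

```r
# Cross-tabulated counts, copied from the table() output above
tab <- matrix(c(26838, 11319,
                10153,  9416),
              nrow = 2, byrow = TRUE,
              dimnames = list(educ_univ = c("No university education",
                                            "University education"),
                              trusting  = c("Not trusting", "Trusting")))

# Row proportions, rounded to 2 decimals (matches the row % above)
round(prop.table(tab, margin = 1), 2)
```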

Multiple regression model without interaction terms

Our dependent variable (redistribute) can be treated as numeric, so we will fit a linear (ordinary least squares) regression model using the lm() function:

Code
m1 <- lm(redistribute ~ educ_univ + trusting + female + employment + hh_income + married + child + politics, data = akaeda)

Let’s then check the model coefficients (parameters):

Code
model_parameters(m1)
Parameter                        | Coefficient |       SE |         95% CI | t(40050) |      p
----------------------------------------------------------------------------------------------
(Intercept)                      |        4.05 |     0.05 | [ 3.95,  4.15] |    77.35 | < .001
educ univ [University education] |       -0.50 |     0.03 | [-0.56, -0.44] |   -16.51 | < .001
trusting [Trusting]              |        0.30 |     0.03 | [ 0.25,  0.36] |    10.12 | < .001
female [female]                  |        0.14 |     0.03 | [ 0.09,  0.20] |     5.12 | < .001
employment [Other]               |    5.89e-03 |     0.05 | [-0.09,  0.10] |     0.12 | 0.901 
employment [Retired]             |        0.09 |     0.04 | [ 0.02,  0.16] |     2.62 | 0.009 
employment [Self-employed]       |       -0.21 |     0.06 | [-0.33, -0.09] |    -3.54 | < .001
employment [Unemployed]          |       -0.20 |     0.06 | [-0.31, -0.09] |    -3.58 | < .001
hh income                        |        0.03 | 7.73e-03 | [ 0.01,  0.04] |     3.53 | < .001
married [Maried]                 |       -0.16 |     0.03 | [-0.22, -0.09] |    -4.96 | < .001
child [Has children]             |       -0.12 |     0.04 | [-0.19, -0.05] |    -3.30 | < .001
politics                         |        0.24 | 6.01e-03 | [ 0.22,  0.25] |    39.30 | < .001

Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
  using a Wald t-distribution approximation.

Model with an interaction between education and social trust

We can interact predictor variables in regression models using the : and * operators. With :, we include in the regression model only the interaction term, but not the component variables (i.e. their main effects); to include the component variables as well, we need to add them in as usual with the + operator. Note that we should normally want to include both the main effects and the interaction effect in a regression model. With *, we include both the main effects and the interaction effect, so it provides a shortcut to using :, but it is often useful to combine the two in more complex interaction scenarios.

The specifications below are equivalent:

Code
# Interaction using ":"
m2 <- lm(redistribute ~ educ_univ + trusting + educ_univ:trusting + female + employment + hh_income + married + child + politics, data = akaeda)

m2 <- lm(redistribute ~ educ_univ*trusting + female + employment + hh_income + married + child + politics, data = akaeda)

m2 <- lm(redistribute ~ educ_univ + trusting + educ_univ*trusting + female + employment + hh_income + married + child + politics, data = akaeda)

As a precaution, it is advisable to use the * operator and to manually include all the constituent terms as well, as in the third version above.
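We can confirm that these formula specifications expand to the same model terms using base R's terms() function (a quick check on formula syntax only; no data is needed, and the variable names here are placeholders):

```r
# Two of the equivalent specifications from above, in generic form
f_colon <- y ~ x1 + x2 + x1:x2
f_star  <- y ~ x1 * x2

# Both expand to the same three terms: "x1", "x2" and "x1:x2"
attr(terms(f_colon), "term.labels")
attr(terms(f_star),  "term.labels")
```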

Let’s then check the model coefficients (parameters):

Code
model_parameters(m2)
Parameter                                              | Coefficient |       SE |         95% CI | t(40049) |      p
--------------------------------------------------------------------------------------------------------------------
(Intercept)                                            |        4.13 |     0.05 | [ 4.02,  4.23] |    77.15 | < .001
educ univ [University education]                       |       -0.68 |     0.04 | [-0.76, -0.60] |   -17.06 | < .001
trusting [Trusting]                                    |        0.15 |     0.04 | [ 0.07,  0.22] |     3.86 | < .001
female [female]                                        |        0.14 |     0.03 | [ 0.09,  0.20] |     5.00 | < .001
employment [Other]                                     |   -1.39e-03 |     0.05 | [-0.09,  0.09] |    -0.03 | 0.976 
employment [Retired]                                   |        0.09 |     0.04 | [ 0.02,  0.16] |     2.47 | 0.013 
employment [Self-employed]                             |       -0.21 |     0.06 | [-0.33, -0.10] |    -3.59 | < .001
employment [Unemployed]                                |       -0.21 |     0.06 | [-0.32, -0.10] |    -3.82 | < .001
hh income                                              |        0.02 | 7.75e-03 | [ 0.01,  0.04] |     2.94 | 0.003 
married [Married]                                      |       -0.15 |     0.03 | [-0.22, -0.09] |    -4.90 | < .001
child [Has children]                                   |       -0.12 |     0.04 | [-0.20, -0.05] |    -3.48 | < .001
politics                                               |        0.24 | 6.01e-03 | [ 0.22,  0.25] |    39.14 | < .001
educ univ [University education] × trusting [Trusting] |        0.41 |     0.06 | [ 0.29,  0.52] |     6.98 | < .001

Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
  using a Wald t-distribution approximation.

We can compare our results with those reported in Model 3 (Table 2) of Akaeda (2023), keeping in mind that our data are very different. Nonetheless, we find a similar positive effect for the interaction term (0.41), which in our case is stronger.

Interpreting interactions

As with results from logistic regression models, interpreting interaction results is not straightforward.

First of all, the coefficients associated with the individual variables included in the interaction term can no longer be interpreted in the usual direct way: we usually no longer interpret the “main effects” of educ_univ and trusting on their own, but focus on the multiplicative effect of the interaction between them. The interpretation of the numeric results, however, is rather convoluted and prone to misinterpretation. In essence, the interaction coefficient gives us the difference between the effect of a unit change in trust among the university-educated and the effect of a unit change in trust among the non-university-educated.
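A small arithmetic check makes this concrete. The numbers below are copied (rounded to two decimals) from the model_parameters(m2) output above; this is a hand-calculation for illustration, not part of the analysis pipeline:

```r
b_trust <- 0.15   # effect of trust among the non-university-educated
b_inter <- 0.41   # difference in the trust effect for the university-educated

# Implied effect of trust among the university-educated:
b_trust + b_inter
#> 0.56
```

So a change from “Not trusting” to “Trusting” is associated with a 0.15-point increase in redistributive preference among those without university education, but a 0.56-point increase among the university-educated.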

As with interpreting logistic regression, it can be much easier and more reliable to interpret a plot of the “marginal effects”. As we already know from Lab 4, this can be achieved using the ggpredict() and plot() functions, adding both constitutive terms of the interaction. Depending on which variable we write first, we get either the effect of education by trust or the effect of trust by education:

Education by trust:

Code
ggpredict(m2, terms = c("educ_univ", "trusting")) |> 
  plot()

Trust by education:

Code
ggpredict(m2, terms = c("trusting", "educ_univ")) |> 
  plot()

The latter plot reproduces Figure 1 from Akaeda (2023). To get an even closer reproduction of that Figure, we could add an additional argument to the plotting function, asking for the point-predictions to be connected with lines:

Code
ggpredict(m2, terms = c("trusting", "educ_univ")) |> 
  plot(connect.lines = TRUE)

Looking at whether and how much the confidence intervals around the point predictions overlap, we can visually assess the interaction effect. We find that differences in the level of social trust have a much steeper effect on redistributive attitudes among the highly educated than among the lower educated.

Do educational reforms have different effects by levels of parental education?

This research question is at the core of the study conducted by Österman (2021). We have explored this article and used its data to fit simple binary and multiple regression models in Week 2. We can now take that analysis further and replicate some of the models reported in Table 3 of that article:

Österman (2021), Table 3

As our main interest is not so much in their substantive findings regarding the complex effects of different types of educational reforms, but rather in seeing interaction effects operating in real-life data, we will focus on Models 1–3 (All reforms). The main difference compared to the previous example is conceptual. The variable “Reform” distinguishes between respondents who were affected by a major reform of the secondary education system in their country during their schooling years and those who were not. This allows the researcher to posit a “quasi-experimental” scenario, in which being exposed to a “reform” can be interpreted in a more causal way, as in an experimental setting. The “reform” can thus be thought of as a “treatment” in an experiment, so examining its statistical effect can be taken to have a more reliably causal interpretation than when exploring associations between purely “descriptive” variables. Technically, however, the models we fit are no different from those in the previous example.

Data management

As a first step, let’s import the osterman dataset that underpins the Österman (2021) article (see the Data page on the course website for information about the datasets available in this course). We can download the data to a local folder and load it from there, or we can load it directly from the web (if it works…):

Code
# Import the data
osterman <- data_read("https://cgmoreh.github.io/HSS8005-24/data/osterman.dta")

# `View` the codebook:
data_codebook(osterman) |> View()

Model 1 and 2 (redux)

We start by fitting a simplified version of Model 1 (in Table 3 of Österman (2021)). The reported model contains the variables highlighted in the table (“Reform”, “Female” and “Ethnic minority”) as well as a number of additional covariates for statistical control (fbrneur, mbrneur, fnotbrneur, mnotbrneur, agea, essround, yrbrn, reform_id_num), including interactions between some of those control variables: yrbrn*yrbrn, yrbrn*reform_id_num, agea*reform_id_num, agea*agea, agea*agea*reform_id_num. The model is also adjusted for sampling weights (dweight) and the standard errors are clustered in “country-by-birth-year cohort” groups (see Österman 2021, p. 222 for the reasoning behind this modelling choice; we will discuss this aspect of the data in more detail when we consider multilevel models). We will fit a much simplified model that only includes the few variables highlighted in the table, without any additional control variables, weighting or clustering.

Model 1 doesn’t include an interaction, so it is a simple multiple linear regression model:

Code
osterman_m1 <- lm(trustindex3 ~ reform1_7 + female + blgetmg_d, data = osterman)

# Print model parameters, with digits rounded to three decimal points
model_parameters(osterman_m1, digits = 3)
Parameter   | Coefficient |    SE |           95% CI | t(68792) |      p
------------------------------------------------------------------------
(Intercept) |       5.251 | 0.013 | [ 5.226,  5.276] |  412.067 | < .001
reform1 7   |      -0.008 | 0.015 | [-0.037,  0.020] |   -0.560 | 0.576 
female      |       0.007 | 0.015 | [-0.021,  0.036] |    0.493 | 0.622 
blgetmg d   |      -0.478 | 0.060 | [-0.595, -0.361] |   -8.010 | < .001

Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
  using a Wald t-distribution approximation.

We find that without the additional model specifications, our results are somewhat different from those reported by the authors (it is, of course, our model that is lacking). The “Reform” effect is much weaker and in the opposite direction, losing its statistical significance (p = 0.576). The effect of “Female” is also weaker and statistically not significant. On the other hand, belonging to an “ethnic minority group” has a stronger and clearer effect in this simplified model.

Model 2 includes two additions: an extra predictor for “High Parental Education”, as well as an interaction effect between “Reform” and “High parental education”. As in the previous example, we can fit the model like so:

Code
osterman_m2 <- lm(trustindex3 ~ reform1_7 + paredu_a_high + reform1_7*paredu_a_high + female + blgetmg_d, data = osterman)

# Print model parameters, with digits rounded to three decimal points
model_parameters(osterman_m2, digits = 3)
Parameter                 | Coefficient |    SE |           95% CI | t(64954) |      p
--------------------------------------------------------------------------------------
(Intercept)               |       5.060 | 0.015 | [ 5.030,  5.089] |  336.557 | < .001
reform1 7                 |       0.032 | 0.018 | [-0.004,  0.068] |    1.765 | 0.078 
paredu a high             |       0.562 | 0.022 | [ 0.520,  0.604] |   26.038 | < .001
female                    |       0.012 | 0.015 | [-0.017,  0.041] |    0.824 | 0.410 
blgetmg d                 |      -0.433 | 0.061 | [-0.553, -0.313] |   -7.068 | < .001
reform1 7 × paredu a high |      -0.138 | 0.031 | [-0.199, -0.077] |   -4.446 | < .001

Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
  using a Wald t-distribution approximation.

While the results of our Model 2 still differ from those reported based on a more complex model, we find that our estimates are much closer now, with all the coefficients pointing in the same direction as in the reported model, including the interaction effect of “Reform” by “High parental education”. Because the two variables involved in this interaction are both simple binary (indicator) variables coded 0 and 1, it is possible to make (some) sense of these results. Compared to the “main” effect shown in the previous model, the coefficient for “Reform” now refers to the effect of the reforms only among individuals with less educated parents (coded 0). The interaction coefficient shows, by contrast, the difference in the effect of “Reform” for those with highly educated parents (coded 1). In this case, the negative coefficient shows that among those with highly educated parents, the effect of educational “reforms” is smaller.
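Again, a quick arithmetic check using the coefficients printed above gives the implied “Reform” effect within each group (a hand-calculation for illustration, with the rounded numbers copied from the model_parameters(osterman_m2) output):

```r
b_reform <- 0.032    # reform effect when paredu_a_high = 0 (less educated parents)
b_inter  <- -0.138   # difference in the reform effect when paredu_a_high = 1

# Implied reform effect among those with highly educated parents:
b_reform + b_inter
#> -0.106
```

The implied effect among those with highly educated parents is not just smaller but negative, which is what the marginal plot below the model output makes visible.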

We can check more closely the meaning of the interaction effect using a marginal plot:

Code
ggpredict(osterman_m2, terms = c("reform1_7", "paredu_a_high")) |> 
  plot(connect.lines = TRUE)

This visualisation is now easier to interpret than the numerical output. We see a steeper slope among those whose parents were highly educated compared to those whose parents had lower education, but we can also see more clearly that the effect of educational reforms within the two groups runs in opposite directions: among those with lower parental education, educational reforms are associated with higher levels of trust, while among those with higher parental education the association is negative.

Model 3 differs from Model 2 in that “Reform” is not the only variable interacted with “High parental education”: parental education is also interacted with all the other covariates, which we did not include in our models.

References

Akaeda, Naoki. 2023. “Trust and the Educational Gap in the Demand for Redistribution: Evidence from the World Values Survey and the European Value Study.” International Sociology 38(3):290–310. doi: 10.1177/02685809231167834.
Berry, W. D., M. Golder, and D. Milton. 2012. “Improving Tests of Theories Positing Interaction.” Journal of Politics 74(3):653–71. doi: 10.1017/S0022381612000199.
Brambor, T., W. R. Clark, and M. Golder. 2006. “Understanding Interaction Models: Improving Empirical Analyses.” Political Analysis 14(1):63–82. doi: 10.1093/pan/mpi014.
Clark, William Roberts, and Matt Golder. 2023. Interaction Models: Specification and Interpretation. Cambridge: Cambridge University Press.
Dingemans, Ellen, and Erik Van Ingen. 2015. “Does Religion Breed Trust? A Cross-National Study of the Effects of Religious Involvement, Religious Faith, and Religious Context on Social Trust.” Journal for the Scientific Study of Religion 54(4):739–55. doi: 10.1111/jssr.12217.
Hainmueller, J., J. Mummolo, and Y. Q. Xu. 2019. “How Much Should We Trust Estimates from Multiplicative Interaction Models? Simple Tools to Improve Empirical Practice.” Political Analysis 27(2):163–92. doi: 10.1017/pan.2018.46.
Mitchell, Jeffrey. 2021. “Social Trust and Anti-Immigrant Attitudes in Europe: A Longitudinal Multi-Level Analysis.” Frontiers in Sociology 6.
Österman, Marcus. 2021. “Can We Trust Education for Fostering Trust? Quasi-experimental Evidence on the Effect of Education and Tracking on Social Trust.” Social Indicators Research 154(1):211–33. doi: 10.1007/s11205-020-02529-y.
Wu, Cary. 2021. “Education and Social Trust in Global Perspective.” Sociological Perspectives 64(6):1166–86. doi: 10.1177/0731121421990045.